Data quality control (QC)

Correlation between samples:

Here we show scatterplots comparing expression levels for all genes between the different samples, for i) all controls, ii) all treatment samples and iii) for all samples together.

These plots will only be produced when the total number of samples to compare within a group is less than or equal to 10.

Heatmap and clustering showing correlation between replicates

Brown: higher correlation; yellow: lower correlation.

Principal Component Analysis

This PCA plot shows the count values after normalization with the default method and subsequent scaling:

Inertia plot exploring variance explained by dimensions

Graphical representation of the PCA dimensions. The bars represent the percentage of total variance explained by each dimension. The line shows the cumulative percentage of total variance across dimensions. The color distinguishes significant from non-significant dimensions. Only significant dimensions are considered in the following plots.
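The two quantities in the inertia plot can be illustrated with a minimal sketch. It assumes that the bars are each eigenvalue divided by the sum of all eigenvalues, and that the line is the running total including the current dimension. The function name `inertia_table` and the eigenvalues are hypothetical and are not taken from this run.

```python
def inertia_table(eigenvalues):
    """Return (percent, cumulative_percent) of variance per PCA dimension."""
    total = sum(eigenvalues)
    # Bars: share of total variance carried by each dimension.
    percent = [100.0 * ev / total for ev in eigenvalues]
    # Line: running total of those shares.
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


# Hypothetical eigenvalues, for illustration only.
percent, cumulative = inertia_table([4.0, 3.0, 2.0, 1.0])
for dim, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"Dim {dim}: {p:.1f}% of variance, {c:.1f}% cumulative")
```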

Distribution of Eigenvectors

Each eigenvector contains the weights of each gene for the corresponding principal component (PC). The distributions of the weights of each eigenvector are shown here. The vertical lines represent the quantiles.

Comparison of significant dimensions

This plot compares the positions of the samples and their distributions in the significant dimensions. The color differentiates between control (red) and treatment (blue) samples.

Representation of the samples in the first two dimensions of the PCA

Representation of the samples and the categories of qualitative variables in the first two dimensions of the PCA

Hierarchical clustering of individuals using the first 19 significant PCA dimensions

PCA representation of axes 1 and 2, with individuals coloured by cluster membership. The first 19 significant PCA dimensions are used for hierarchical clustering on principal components (HCPC).

Relationship between HCPC clusters and experiment design

Fisher’s exact test is computed between clusters and experimental treatments. The Fisher’s exact test P values and FDR are shown.
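As an illustration of the kind of test described here, the following sketch computes a two-sided Fisher's exact test for a 2x2 table (for example, samples inside or outside one cluster versus control or treatment) and then applies a Benjamini-Hochberg FDR correction. Both the 2x2 layout and the choice of Benjamini-Hochberg are assumptions; the report does not state how the tables are built or which FDR method is used. All function names and numbers are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical table: rows = in cluster / not in cluster, columns = control / treatment.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P value: {p:.4f}")
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```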

Representation of correlation and P value of numeric factors and PCA dimensions

None of the factors were significantly associated with any dimension

Representation of R2 and P value of qualitative factors and PCA dimensions

Representation of estimated coordinates from the barycentre and P value of qualitative factors and PCA dimensions

Visualizing normalization results

These boxplots show the distributions of count data before and after normalization (shown for the default normalization method):

Representation of unfiltered CPM (counts per million) data:

Before normalization:

After normalization:

Count metrics by sample ranks

Sample rank versus total counts

Sample rank is the position a sample holds after sorting by total counts

Statistics of expressed genes

Samples are ranked by total expressed genes. Union of expressed genes represents the cumulative total expressed genes (sum of all genes expressed in any sample up to current sample, expected to increase with sample rank). Intersection of expressed genes represents the cumulative intersection of expressed genes (sum of genes expressed in all samples up to current sample, expected to decrease with sample rank).
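The cumulative union and intersection described above can be sketched as follows. The sketch assumes that samples are ranked in ascending order of their number of expressed genes and that each sample is represented by the set of genes it expresses. The sample and gene names are hypothetical.

```python
def cumulative_union_intersection(expressed_by_sample):
    """Return rows of (rank, sample, union size, intersection size)."""
    # Rank samples by their total number of expressed genes (ascending order assumed).
    ranked = sorted(expressed_by_sample.items(), key=lambda item: len(item[1]))
    union = set()
    intersection = None
    rows = []
    for rank, (sample, genes) in enumerate(ranked, start=1):
        union |= genes  # genes expressed in any sample so far
        # genes expressed in every sample so far
        intersection = set(genes) if intersection is None else intersection & genes
        rows.append((rank, sample, len(union), len(intersection)))
    return rows


# Hypothetical expressed-gene sets, for illustration only.
example = {
    "S1": {"g1", "g2"},
    "S2": {"g1", "g2", "g3"},
    "S3": {"g2", "g3", "g4", "g5"},
}
for row in cumulative_union_intersection(example):
    print(row)
```

The union size can only grow or stay the same with rank, and the intersection size can only shrink or stay the same, which matches the trends described in the text.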

Mean count distribution by filter

This plot shows the distribution of mean counts per gene, classified by filter.

Gene counts variance distribution

The variance of gene counts across samples is shown. Genes with a variance lower than the selected threshold (dashed grey line) were filtered out.
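A variance filter of this kind can be sketched as below. The sketch assumes that the threshold is supplied directly as a variance value and that the sample variance is used; how the report derives the threshold (for example from the count_var_quantile option) is not shown here. The function name and data are hypothetical.

```python
from statistics import variance


def filter_by_variance(counts, threshold):
    """Keep genes whose sample variance across samples is at least `threshold`."""
    kept = {}
    for gene, values in counts.items():
        if variance(values) >= threshold:
            kept[gene] = values
    return kept


# Hypothetical counts per gene across three samples.
counts = {
    "geneA": [10, 10, 10],  # variance 0: filtered out
    "geneB": [1, 5, 9],     # variance 16: kept
}
print(sorted(filter_by_variance(counts, threshold=1.0)))
```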

Samples differences by all counts normalized:

All counts were normalized using the default algorithm (see options below). The normalized counts were log10-scaled and plotted in a heatmap.

Percentages of reads per sample mapping to the most highly expressed genes

M_16174 M_19255 M_16188 M_16200 M_17208 M_17216 M_19222 M_19229 M_20264 M_20280 M_20291 M_20295 M_20312 M_20319 M_20317 M_20326 M_1273 M_1280 M_1274 M_1298 M_16173 M_16172 M_16181 M_16182 M_16171 M_16175 M_16180 M_16178 M_16186 M_16176 M_16189 M_16179 M_16187 M_16195 M_16183 M_16185 M_16196 M_16190 M_16191 M_16194 M_16192 M_17204 M_16197 M_16198 M_16199 M_16201 M_17203 M_17214 M_17205 M_17206 M_17209 M_17207 M_17210 M_17211 M_17212 M_17213 M_19223 M_19218 M_19219 M_19220 M_19245 M_19230 M_19224 M_19232 M_19227 M_19234 M_19261 M_19235 M_19249 M_19248 M_19236 M_19238 M_19251 M_19252 M_19237 M_19246 M_19247 M_20267 M_19256 M_19257 M_20268 M_20283 M_20284 M_20262 M_20270 M_20273 M_20296 M_20289 M_20276 M_20293 M_20286 M_20292 M_20290 M_20294 M_20299 M_20302 M_20303 M_20304 M_20307 M_20308 M_20309 M_20310 M_20311 M_20318 M_20328 M_20313 M_20315 M_20316 M_20322 M_20327 M_20325 M_20329 M_20331 M_1284 M_1285 M_1286 M_1287 M_1290 M_1292 M_1295 M_1296 M_1299 M_12151 M_12152 M_12154 M_12156 M_13158 M_13160 M_13162 M_13163 M_13164 M_13165 M_13168
ENSG00000274012 7.230 7.721 4.415 7.204 9.061 6.126 6.225 5.359 9.347 5.588 7.025 6.835 7.127 5.651 4.761 8.186 6.752 7.150 7.884 4.381 6.097 7.223 7.746 7.439 7.411 7.044 8.241 7.928 6.924 7.833 10.228 7.705 6.729 9.119 9.156 8.705 7.371 6.911 7.317 9.526 5.577 5.269 13.838 6.579 7.102 6.627 5.403 7.999 5.840 7.714 7.802 9.222 8.847 9.086 9.556 8.132 11.158 7.598 8.778 7.478 5.315 6.034 7.024 6.912 8.030 7.263 7.213 4.749 5.652 5.448 5.434 7.644 7.475 5.551 5.345 6.317 6.183 7.675 3.932 3.826 3.849 5.342 6.342 10.133 6.666 7.013 9.264 5.777 7.663 7.573 6.800 6.321 6.408 6.735 9.044 5.904 7.806 10.442 8.814 6.953 8.204 7.096 5.204 5.739 7.633 7.085 6.188 5.534 6.765 6.619 6.929 7.871 7.720 8.169 5.610 7.956 8.417 6.233 9.142 8.840 5.824 7.298 6.361 4.784 5.405 5.776 8.348 6.818 9.328 7.851 8.076 8.308 5.964
ENSG00000276168 6.372 5.786 5.015 6.974 8.082 5.073 4.999 4.705 7.587 5.582 6.286 6.344 6.025 5.384 3.222 5.875 6.177 5.559 6.578 4.605 4.231 4.922 6.866 4.754 5.780 4.947 5.767 6.378 6.819 6.110 8.076 6.384 5.658 7.002 6.713 8.074 6.772 5.265 5.950 6.396 3.781 4.734 12.154 5.037 5.671 2.716 5.000 6.380 4.594 6.253 7.075 7.963 7.918 6.419 7.979 7.426 9.653 6.206 7.505 6.426 4.693 5.063 5.245 4.469 7.052 5.807 4.476 4.129 5.357 4.329 4.005 6.450 4.012 4.508 5.153 4.555 5.314 6.045 3.269 3.573 4.509 4.848 5.758 8.814 6.931 6.854 7.929 5.754 6.842 7.400 5.271 6.161 5.793 6.728 6.930 5.700 7.154 9.600 7.145 6.022 6.173 5.417 4.211 4.382 6.510 5.789 4.217 3.270 5.121 5.695 6.649 7.585 6.162 3.616 5.329 6.291 6.929 4.901 5.901 7.230 5.351 6.794 5.351 4.621 4.086 5.234 7.482 6.668 8.155 5.017 6.437 7.007 4.707
ENSG00000283293 1.917 1.554 1.357 2.242 2.327 1.565 1.494 1.172 1.895 1.871 2.438 2.130 1.476 1.835 2.157 1.769 1.819 2.264 1.742 1.774 2.202 2.083 2.021 2.307 1.685 1.401 2.239 2.043 1.791 1.940 2.362 1.950 1.554 1.684 2.490 1.966 1.779 1.768 1.393 2.039 1.743 1.150 2.123 1.639 1.307 1.680 1.131 1.921 1.395 1.908 1.775 1.928 2.105 2.221 2.137 2.077 2.626 2.214 2.354 1.612 1.351 1.568 2.233 1.764 2.058 1.275 1.707 1.304 1.313 1.399 1.661 2.194 1.321 1.406 1.364 1.408 1.424 1.748 0.969 1.057 1.533 1.911 1.781 2.201 2.069 2.356 2.176 2.137 1.946 1.697 2.053 1.923 2.153 1.885 2.611 1.833 2.051 2.434 2.743 1.367 2.047 1.838 1.423 1.692 2.186 1.480 1.569 2.630 1.768 1.724 1.791 2.027 2.265 1.835 1.904 2.073 1.912 1.837 2.327 2.019 1.702 2.067 1.858 1.444 1.250 1.831 2.252 1.838 1.947 2.430 1.975 2.104 1.566
ENSG00000251562 1.382 1.428 1.567 1.037 1.061 1.667 1.425 1.373 1.168 1.227 1.153 1.258 1.337 1.496 1.330 1.653 1.565 1.407 1.290 0.389 1.695 1.656 1.213 1.632 1.368 1.591 1.203 1.265 1.171 1.414 1.155 1.280 1.590 1.226 1.170 0.980 1.336 1.189 1.475 1.066 1.772 1.513 0.091 1.732 0.883 1.336 1.095 1.264 1.273 1.158 1.285 1.256 1.210 1.264 1.182 1.107 0.854 1.248 0.965 1.225 1.277 1.422 1.810 1.213 1.247 1.149 1.676 1.123 1.550 1.306 1.488 1.316 0.844 1.490 1.150 1.241 1.223 1.255 1.311 1.474 0.946 1.150 1.684 1.042 1.051 1.271 1.202 0.976 1.413 1.221 1.298 1.351 1.018 1.509 1.218 1.504 1.247 0.817 1.099 1.403 1.223 1.794 1.891 1.574 1.482 1.583 1.422 1.005 1.457 1.239 1.464 1.245 1.163 1.421 1.554 1.433 1.454 1.381 1.284 1.259 1.287 1.454 1.289 1.699 1.564 1.654 1.189 1.173 1.279 1.335 1.212 1.238 1.556
ENSG00000124942 1.347 1.308 1.008 1.262 1.080 1.424 1.126 1.285 0.923 1.024 1.088 1.182 1.030 1.211 1.330 1.222 1.015 0.972 0.975 0.929 0.928 1.313 1.203 0.943 1.423 1.191 1.356 1.169 1.030 1.506 0.963 1.169 1.200 1.297 1.156 1.116 1.285 1.624 1.134 1.200 1.167 1.123 0.443 1.083 1.212 1.384 1.473 1.029 1.390 1.205 0.960 0.895 0.875 1.009 1.162 1.095 1.027 1.297 1.185 1.185 1.393 1.411 1.317 1.586 1.327 1.193 1.490 1.208 1.216 1.492 1.198 1.009 1.093 1.311 0.998 1.248 1.138 1.125 1.622 1.310 0.788 0.968 1.091 1.110 0.912 1.014 1.176 1.047 1.200 0.921 1.014 1.080 0.956 1.110 0.889 1.047 1.092 0.803 0.870 1.051 1.274 1.374 1.533 1.171 1.117 1.138 1.238 0.961 1.137 1.159 1.171 0.898 1.286 1.365 1.261 1.070 0.950 1.025 1.179 1.093 1.030 1.011 1.183 1.153 1.332 1.246 0.889 1.049 0.964 0.888 1.293 1.106 1.316

Details of the input data

First group of samples (to be referred to as control in the rest of the report)

Sample Names:
M_16174
M_19255
M_16188
M_16200
M_17208
M_17216
M_19222
M_19229
M_20264
M_20280
M_20291
M_20295
M_20312
M_20319
M_20317
M_20326
M_1273
M_1280
M_1274
M_1298

Second group of samples (to be referred to as treatment in the rest of the report)

Sample Names:
M_16173
M_16172
M_16181
M_16182
M_16171
M_16175
M_16180
M_16178
M_16186
M_16176
M_16189
M_16179
M_16187
M_16195
M_16183
M_16185
M_16196
M_16190
M_16191
M_16194
M_16192
M_17204
M_16197
M_16198
M_16199
M_16201
M_17203
M_17214
M_17205
M_17206
M_17209
M_17207
M_17210
M_17211
M_17212
M_17213
M_19223
M_19218
M_19219
M_19220
M_19245
M_19230
M_19224
M_19232
M_19227
M_19234
M_19261
M_19235
M_19249
M_19248
M_19236
M_19238
M_19251
M_19252
M_19237
M_19246
M_19247
M_20267
M_19256
M_19257
M_20268
M_20283
M_20284
M_20262
M_20270
M_20273
M_20296
M_20289
M_20276
M_20293
M_20286
M_20292
M_20290
M_20294
M_20299
M_20302
M_20303
M_20304
M_20307
M_20308
M_20309
M_20310
M_20311
M_20318
M_20328
M_20313
M_20315
M_20316
M_20322
M_20327
M_20325
M_20329
M_20331
M_1284
M_1285
M_1286
M_1287
M_1290
M_1292
M_1295
M_1296
M_1299
M_12151
M_12152
M_12154
M_12156
M_13158
M_13160
M_13162
M_13163
M_13164
M_13165
M_13168

Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.
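The sign convention can be illustrated with a naive calculation. This is only a sketch: each DE package estimates the log fold change with its own statistical model, whereas the code below simply takes the log2 ratio of group means, with a pseudocount added as an assumption to avoid division by zero. The function name and values are hypothetical.

```python
from math import log2
from statistics import mean


def naive_log2_fold_change(control, treatment, pseudocount=1.0):
    """log2(treatment mean / control mean); positive means higher in treatment."""
    return log2((mean(treatment) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical normalized counts for a single gene.
print(naive_log2_fold_change(control=[3, 3], treatment=[7, 7]))  # higher in treatment
print(naive_log2_fold_change(control=[7, 7], treatment=[3, 3]))  # higher in control
```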

DEgenes Hunter results

Gene classification by DEgenes Hunter

DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly:

  • Filtered out: Genes discarded during the filtering process as showing no or very low expression.
  • Prevalent DEG: Genes considered as differentially expressed (DE) by at least 4 packages, as specified by the minpack_common argument.
  • Possible DEG: Genes considered DE by at least one of the DE detection packages.
  • Not DEG: Genes not considered DE in any package.
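The labelling rules in the list above can be expressed as a short sketch. It is an illustration of the rules as stated in this report, not the DEgenes Hunter source code. The function name and arguments are hypothetical, and the default of 4 mirrors the minpack_common value used in this run.

```python
def classify_gene(passed_filter, n_packages_de, minpack_common=4):
    """Return the report label for a gene.

    passed_filter: True if the gene survived the expression filter.
    n_packages_de: number of DE detection packages that called the gene DE.
    """
    if not passed_filter:
        return "Filtered out"
    if n_packages_de >= minpack_common:
        return "Prevalent DEG"
    if n_packages_de >= 1:
        return "Possible DEG"
    return "Not DEG"


# Hypothetical genes, for illustration only.
print(classify_gene(passed_filter=False, n_packages_de=0))
print(classify_gene(passed_filter=True, n_packages_de=4))
print(classify_gene(passed_filter=True, n_packages_de=2))
print(classify_gene(passed_filter=True, n_packages_de=0))
```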

This barplot shows the total number of genes passing each stage of analysis - from the total number of genes in the input table of counts, to the genes surviving the expression filter, to the genes detected as DE by one package, to the genes detected by at least 4 packages.

Package DEG detection stats

This is the Venn diagram of all possible DE genes (DEGs) according to at least one of the DE detection packages employed:

Plot showing variability between different DEG detection methods in terms of logFC calculation

This graph shows the logFC calculated (y-axis) by each package (points) for each gene (x-axis). Only genes with variability over 0.01 are plotted. This representation allows the user to observe the behaviour of each DE package and to see whether any of them gives atypical results.

If there are no genes showing sufficient variance in estimated logFC across methods, no plot will be produced and a warning message will be given.

FDR gene-wise benchmarking

Benchmark of false positive calling:

Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package

FDR Volcano Plot showing log2 fold change vs. FDR

The red horizontal line represents the chosen FDR threshold of 0.05. The black lines represent other values.
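The position of a gene in a volcano plot can be sketched as follows. The sketch assumes the common convention of plotting -log10(FDR) on the y-axis, and it assumes that a gene is highlighted when its FDR is at or below the threshold and its absolute log2 fold change is at or above a cutoff. The defaults mirror the p_val_cutoff (0.05) and lfc (0.6) options of this run; the function name and example values are hypothetical.

```python
from math import log10


def volcano_point(log2_fc, fdr, fdr_cutoff=0.05, lfc_cutoff=0.6):
    """Return (x, y, highlighted) for one gene in a volcano plot."""
    x = log2_fc
    y = -log10(fdr)  # smaller FDR values plot higher
    highlighted = fdr <= fdr_cutoff and abs(log2_fc) >= lfc_cutoff
    return x, y, highlighted


# Height of the threshold line under the -log10 convention.
print(-log10(0.05))

# Hypothetical genes, for illustration only.
print(volcano_point(1.2, 0.001))
print(volcano_point(0.2, 0.001))
print(volcano_point(-1.5, 0.2))
```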

Principal Component Analysis

This PCA plot shows the count values after normalization with the default method and subsequent scaling:

Inertia plot exploring variance explained by dimensions

Graphical representation of the PCA dimensions. The bars represent the percentage of total variance explained by each dimension. The line shows the cumulative percentage of total variance across dimensions. The color distinguishes significant from non-significant dimensions. Only significant dimensions are considered in the following plots.

Distribution of Eigenvectors

Each eigenvector contains the weights of each gene for the corresponding principal component (PC). The distributions of the weights of each eigenvector are shown here. The vertical lines represent the quantiles.

Comparison of significant dimensions

This plot compares the positions of the samples and their distributions in the significant dimensions. The color differentiates between control (red) and treatment (blue) samples.

Representation of the samples in the first two dimensions of the PCA

Representation of the samples and the categories of qualitative variables in the first two dimensions of the PCA

Hierarchical clustering of individuals using the first 7 significant PCA dimensions

PCA representation of axes 1 and 2, with individuals coloured by cluster membership. The first 7 significant PCA dimensions are used for hierarchical clustering on principal components (HCPC).

Relationship between HCPC clusters and experiment design

Fisher’s exact test is computed between clusters and experimental treatments. The Fisher’s exact test P values and FDR are shown.

Representation of correlation and P value of numeric factors and PCA dimensions

None of the factors were significantly associated with any dimension

Representation of R2 and P value of qualitative factors and PCA dimensions

None of the factors were significantly associated with any dimension

Representation of estimated coordinates from the barycentre and P value of qualitative factors and PCA dimensions

None of the categories were significantly associated with any dimension

The complete results of the DEgenes Hunter differential expression analysis can be found in the “hunter_results_table.txt” file in the Common_results folder.

DE detection package specific results

Various plots specific to each package are shown below:

DESeq2 size factor vs. sample rank

The effective library size is the factor used by the DESeq2 normalization algorithm for each sample. The effective library size is expected to depend on the raw library size.
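To make the idea of a per-sample normalization factor concrete, the sketch below follows the median-of-ratios idea on which DESeq2 size factors are based: each sample's counts are compared with the per-gene geometric mean across samples, and the median ratio is taken as that sample's factor. It is a simplified illustration and not the DESeq2 implementation. The function name and counts are hypothetical.

```python
from math import exp, log
from statistics import median


def median_of_ratios_size_factors(counts):
    """counts: dict of sample name -> list of raw counts (same gene order)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene; genes with any zero count are skipped.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor: median ratio of the sample's counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = exp(median(log_ratios))
    return factors


# Hypothetical counts: sample B was sequenced twice as deeply as sample A.
factors = median_of_ratios_size_factors({"A": [10, 20, 30], "B": [20, 40, 60]})
print(factors)
```

In this toy example the factor for B is twice the factor for A, which reflects the dependence on raw library size mentioned above.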

DESeq2 normalization effects:

This plot compares the effective library size with raw library size

The effective library size is the factor used by the DESeq2 normalization algorithm for each sample. The effective library size is expected to depend on the raw library size.

DESeq2 MA plot:

This is the MA plot from DESeq2 package:

In DESeq2, the MA plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted P value is less than 0.1. Points that fall outside the window are plotted as open triangles pointing either up or down.

A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt

A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt

Differences between samples by PREVALENT DEGs normalized counts:

Counts of prevalent DEGs were normalized using the DESeq2 algorithm. The normalized counts were log10-scaled and plotted in a heatmap.

edgeR MA plot

This is the MA plot from edgeR package:

Differential gene expression data can be visualized as MA-plots (log ratio versus abundance) where each dot represents a gene. The differentially expressed genes are colored red and the non-differentially expressed ones are colored black.

A table containing the edgeR DEGs is provided in Results_edgeR/DEgenes_edgeR.txt

A table containing the edgeR normalized counts is provided in Results_edgeR/Normalized_counts_edgeR.txt

Detailed comparison of package results

This is an advanced section for comparing the output of the packages used to perform the data analysis. The data shown here do not necessarily have any biological implication.

P-value Distributions

Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)

FDR Correlations

Correlations of p-values adjusted for multiple testing (FDR) and of log fold change.

Values of options used to run DEgenes Hunter

First column contains the option names; second column contains the given values for each option in this run.

opt
input_file /mnt/home/users/bio_267_uma/elenarojano/projects/tfms/albaSubiri/data/pseudocounts_table.txt
pseudocounts TRUE
reads 2
count_var_quantile 0
minlibraries 2
filter_type separate
output_files /mnt/home/users/bio_267_uma/elenarojano/projects/tfms/albaSubiri/reboot/results/FIB4.Adv.Fib
p_val_cutoff 0.05
lfc 0.6
modules DE
minpack_common 4
target_file /mnt/home/users/bio_267_uma/elenarojano/projects/tfms/albaSubiri/reboot/TARGETS/FIB4.Adv.Fib_target.txt
model_variables
numerics_as_factors FALSE
string_factors Etnia,Sex,AgeHigher50,Smoker,Hypertension,Dyslipidemia,Diabetes,Obesity,Antihypertensives_HPM,DM2,AntiDM,Quartils.Temp,Quartils.MinTemp,Quartil.MaxTemp,Healthy
numeric_factors
WGCNA_memory 5000
WGCNA_norm_method DESeq2
WGCNA_deepsplit 2
WGCNA_min_genes_cluster 20
WGCNA_detectcutHeight 0.995
WGCNA_mergecutHeight 0.25
WGCNA_all FALSE
WGCNA_blockwiseNetworkType signed
WGCNA_blockwiseTOMType signed
WGCNA_minCoreKME 0.7
WGCNA_minKMEtoStay 0.5
WGCNA_corType pearson
multifactorial
help FALSE